Skip to content

Conversation

iximeow
Copy link
Member

@iximeow iximeow commented Sep 18, 2025

this is so much simpler than codifying all the of the bits describing all of the CPU surface area! what a breath of fresh air!

except for the EFER bit indicators, the feature selection here is the intersection of "PPR says it's there", "useful for guests", "supported in byhve/propolis", and "doesn't seem like we're painted into a corner if a future platform changes it." the bits here are a subset of what what I'd seen on a 9365 in a Cosmo, but I've not tried booting guests with this specific definition (yet).

I included the EFER bits here because it looks like bhyve/propolis would let those writes through. In which case if we don't want to advertise the features (and I'm not sure if anyone's actually using UAI) then we should fix bhyve/propolis to disallow them too. AutoIBRS at least has more obvious uses; Linux uses it, illumos uses it, presumably Windows would but the easiest way to check IMO is to boot it while offering the bit and see what EFER turns out to be.

This "just" needs the typical slate of guest OSes tested. I think we'll end up passing AutoIBRS through, hide UAI, and at least be set on the Nexus side.

Comment on lines +609 to +730
// Cache topology leaves are otherwise left zeroed; if we can avoid getting
// into it, let's try!
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I've not actually tried to boot guests with this yet)

Copy link
Member Author

@iximeow iximeow Oct 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Linux, Windows, OmniOS, and FreeBSD all come up with this profile. They all seem functional, and on Linux /proc/cpuinfo just simply does not have cache size information. Seems viable, unless applications turn out to be deeply unhappy here.

@iximeow iximeow marked this pull request as draft September 18, 2025 02:21
Comment on lines +642 to +646
// level 6
let mut zmmhi_state = ExtendedState::empty();
zmmhi_state.set_size(0x200);
zmmhi_state.set_offset(0x380);
leaves.push(Some(zmmhi_state.into_leaf()));
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this subleaf is present on the host (cosmo-paris) with this description, but not in the guest? from cpuid:

Extended state for 256-bit AVX YMM_Hi128 requires 256 bytes, offset 576
Extended state for 512-bit AVX OpMask requires 64-bytes, offset 832
Extended state for 512-bit AVX ZMM_Hi256 requires 512 bytes, offset 896

the last line is this subleaf. subleaf 7 is not shown on the host or in the guest, but in the guest only the first two lines are present. I think this is a bug in cpuid handling the empty subleaves 3 and 4.

The enumeration of bit fields supported in XCR0 is correct (x87, SSE, YMM_Hi128, AVX-512 mask registers, ZMM_Hi256, ZMM_Hi16), and the sizes for supported/enabled features is correct (2432 bytes on the guest and host).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah this is a cpuid bug, not an OS or CPUID profile problem. https://github.com/tycho/cpuid/blob/master/handlers.c#L1057 takes popcount of the leaf presence bits and iterates up to that, which means if there are gaps in the leaf D subleaves it will stop iterating early. since MPX is deprecated, those bits are clear. on hardware the bits here are 0x2e7 with three bits' gap, so iteration stops at subleaf 7 (ZMM_Hi256). in a guest the bits are 0xe7, so iteration stops one level earlier and misses that leaf too.

this is just an incorrect way to iterate the subleaves. i'm gonna send a PR to cpuid to fix it, but i'll be shocked if any OS does this (since, again, MPX is deprecated and this would be an issue upon contact with Zen 5)

@iximeow iximeow force-pushed the ixi/turin-cpu-platform branch from 0f8bd29 to f34e225 Compare October 9, 2025 22:29
Comment on lines +680 to +681
.set_extended_processor_and_feature_identifiers(Some(leaf))
.expect("can set leaf 8000_0001h");
Copy link
Member Author

@iximeow iximeow Oct 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in a truly distressing case of the ankle bone being connected to the wrist bone, if PerfCtrExtCore is set and TopologyExtensions is not, Windows Server 2022 sits in a loop at boot. I noticed this in checking out a fix for oxidecomputer/propolis#959, an initial version of which just cleared TopologyExtensions bit to match discarding leaf 0x8000_001E. Both bits together are fine. Having topology extensions and not six perf counters (as we've had on Milan for a while) is fine. Having neither is fine. Having six perf counters and no topology extensions does a loop at boot.

I'm a little suspicious there's some relationship between this and the incomplete representation of SMT, so I'm going to set this to a more Milan-like situation where we hide perf counter extensions for now, and omit topology extensions, and then see how this looks with issues like oxidecomputer/propolis#940 sorted out.

edit: these bits are now both cleared, and boy will I feel silly if I've overlooked something here

@iximeow iximeow marked this pull request as ready for review October 11, 2025 02:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants